By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
translated by 谷歌翻译
Denoising diffusions are state-of-the-art generative models which exhibit remarkable empirical performance and come with theoretical guarantees. The core idea of these models is to progressively transform the empirical data distribution into a simple Gaussian distribution by adding noise using a diffusion. We obtain new samples whose distribution is close to the data distribution by simulating a "denoising" diffusion approximating the time reversal of this "noising" diffusion. This denoising diffusion relies on approximations of the logarithmic derivatives of the noised data densities, known as scores, obtained using score matching. Such models can be easily extended to perform approximate posterior simulation in high-dimensional scenarios where one can only sample from the prior and simulate synthetic observations from the likelihood. These methods have been primarily developed for data on $\mathbb{R}^d$ while extensions to more general spaces have been developed on a case-by-case basis. We propose here a general framework which not only unifies and generalizes this approach to a wide class of spaces but also leads to an original extension of score matching. We illustrate the resulting class of denoising Markov models on various applications.
translated by 谷歌翻译
线性系统的迭代求解器是部分微分方程(PDE)的数值解的关键组件。过去几十年来一直进行了深入的研究,例如雅各比,高斯 - 塞德尔,共轭梯度,跨部方法及其更高级的变体,但仍有迫切需要开发更快,更强大和更可靠的求解器。基于操作员回归的科学深度学习的最新进展,我们提出了一种提示,即用于微分方程的混合,迭代,数值和可转移的求解器。提示结合了标准放松方法和深层操作员网络(DeepOnet)。与标准数值求解器相比,提示能够为宽类微分方程提供更快的解决方案,同时保留接近机器零的精度。通过本本征分析,我们发现提示中的单个求解器靶向本征谱系中的不同区域,从而导致均匀的收敛速率,从而使混合求解器的整体表现出色。此外,提示适用于多维方程,并且在计算域和可转移到不同离散化方面具有灵活性。
translated by 谷歌翻译
对语音增强系统的培训通常不会纳入人类感知的知识,因此可能导致不自然的声音结果。通过预测网络将精神上动机的语音感知指标纳入模型培训的一部分,最近引起了人们的兴趣。但是,此类预测因子的性能受到培训数据中出现的度量分数的分布的限制。在这项工作中,我们提出了Metricgan +/-(Metricgan+的扩展,一个这样的度量动机系统),该系统引入了一个额外的网络 - 一个“脱发器”,该网络试图改善预测网络的稳健性(并通过扩展。发电机)通过确保观察训练中更广泛的度量得分。VoiceBank数据集的实验结果显示,PESQ得分的相对改善为3.8%(3.05 vs 3.22 PESQ得分),以及更好地概括对看不见的噪音和语音。
translated by 谷歌翻译
我们研究了设计AI代理商的问题,该代理可以学习有效地与潜在的次优伴侣有效合作,同时无法访问联合奖励功能。这个问题被建模为合作焦论双代理马尔可夫决策过程。我们假设仅在游戏的Stackelberg制定中的两个代理中的第一个控制,其中第二代理正在作用,以便在鉴于第一代理的政策给出预期的效用。第一个代理人应该如何尽快学习联合奖励功能,因此联合政策尽可能接近最佳?在本文中,我们分析了如何在这一交互式的两个代理方案中获得对奖励函数的知识。我们展示当学习代理的策略对转换函数有显着影响时,可以有效地学习奖励功能。
translated by 谷歌翻译
深度神经网络在计算机视觉中的许多任务中设定了最先进的,但它们的概括对象扭曲的能力令人惊讶地是脆弱的。相比之下,哺乳动物视觉系统对广泛的扰动是强大的。最近的工作表明,这种泛化能力可以通过在整个视觉皮层中的视觉刺激的表示中编码的有用的电感偏差来解释。在这里,我们成功利用了多任务学习方法的这些归纳偏差:我们共同训练了深度网络以进行图像分类并预测猕猴初级视觉皮层(V1)中的神经活动。我们通过测试其对图像扭曲的鲁棒性来衡量我们网络的分发广泛性能力。我们发现,尽管在训练期间没有这些扭曲,但猴子V1数据的共同训练导致鲁棒性增加。此外,我们表明,我们的网络的鲁棒性非常接近Oracle网络的稳定性,其中架构的部分在嘈杂的图像上直接培训。我们的结果还表明,随着鲁布利的改善,网络的表示变得更加大脑。使用新颖的约束重建分析,我们调查了我们的大脑正规网络更加强大的原因。与我们仅对图像分类接受培训的基线网络相比,我们的共同训练网络对内容比噪声更敏感。使用深度预测的显着性图,用于想象成像图像,我们发现我们的猴子共同训练的网络对场景中的突出区域倾向更敏感,让人想起V1在对象边界的检测中的作用和自下而上的角色显着性。总体而言,我们的工作扩大了从大脑转移归纳偏见的有前途的研究途径,并为我们转移的影响提供了新的分析。
translated by 谷歌翻译
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
translated by 谷歌翻译
View-dependent effects such as reflections pose a substantial challenge for image-based and neural rendering algorithms. Above all, curved reflectors are particularly hard, as they lead to highly non-linear reflection flows as the camera moves. We introduce a new point-based representation to compute Neural Point Catacaustics allowing novel-view synthesis of scenes with curved reflectors, from a set of casually-captured input photos. At the core of our method is a neural warp field that models catacaustic trajectories of reflections, so complex specular effects can be rendered using efficient point splatting in conjunction with a neural renderer. One of our key contributions is the explicit representation of reflections with a reflection point cloud which is displaced by the neural warp field, and a primary point cloud which is optimized to represent the rest of the scene. After a short manual annotation step, our approach allows interactive high-quality renderings of novel views with accurate reflection flow. Additionally, the explicit representation of reflection flow supports several forms of scene manipulation in captured scenes, such as reflection editing, cloning of specular objects, reflection tracking across views, and comfortable stereo viewing. We provide the source code and other supplemental material on https://repo-sam.inria.fr/ fungraph/neural_catacaustics/
translated by 谷歌翻译
In large-scale machine learning, recent works have studied the effects of compressing gradients in stochastic optimization in order to alleviate the communication bottleneck. These works have collectively revealed that stochastic gradient descent (SGD) is robust to structured perturbations such as quantization, sparsification, and delays. Perhaps surprisingly, despite the surge of interest in large-scale, multi-agent reinforcement learning, almost nothing is known about the analogous question: Are common reinforcement learning (RL) algorithms also robust to similar perturbations? In this paper, we investigate this question by studying a variant of the classical temporal difference (TD) learning algorithm with a perturbed update direction, where a general compression operator is used to model the perturbation. Our main technical contribution is to show that compressed TD algorithms, coupled with an error-feedback mechanism used widely in optimization, exhibit the same non-asymptotic theoretical guarantees as their SGD counterparts. We then extend our results significantly to nonlinear stochastic approximation algorithms and multi-agent settings. In particular, we prove that for multi-agent TD learning, one can achieve linear convergence speedups in the number of agents while communicating just $\tilde{O}(1)$ bits per agent at each time step. Our work is the first to provide finite-time results in RL that account for general compression operators and error-feedback in tandem with linear function approximation and Markovian sampling. Our analysis hinges on studying the drift of a novel Lyapunov function that captures the dynamics of a memory variable introduced by error feedback.
translated by 谷歌翻译
In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
translated by 谷歌翻译